Text localization in color documents
Abstract
A new method for text localization in color cover pages and general color document images is presented. The colors of the document image are first reduced to a small number using a color reduction technique based on a Kohonen Self-Organized Map (KSOM) neural network. Each resulting color defines a color plane, from which the connected components (CCs) are extracted. In each color plane, a CC filtering procedure is applied, followed by a local grouping procedure that assembles the remaining CCs into groups. These groups are then refined by obtaining the Direction Of Connection (DOC) property for each CC, and the DOC property is used to classify the groups as text or non-text regions. Finally, the text regions identified in the different color planes are superimposed to give the final text localization for the entire document. The proposed technique was extensively tested with a large number of color documents.
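The color-plane and connected-component steps described in the abstract can be illustrated with a minimal sketch. It assumes the image has already been reduced to a small palette and is given as rows of palette indices (the KSOM color reduction itself is not shown). The function names, the 4-connectivity, and the `min_area` size filter are hypothetical choices made for illustration; the abstract does not specify the actual filtering criteria, and the grouping and DOC steps are omitted.

```python
from collections import deque

def color_planes(indexed, n_colors):
    """Split an indexed image (rows of palette indices) into one binary plane per color."""
    return [[[1 if px == c else 0 for px in row] for row in indexed]
            for c in range(n_colors)]

def connected_components(plane):
    """Return the 4-connected components of a binary plane as lists of (row, col)."""
    h, w = len(plane), len(plane[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if plane[r][c] and not seen[r][c]:
                # Breadth-first flood fill starting from an unvisited foreground pixel.
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and plane[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps

def filter_components(comps, min_area=2):
    """Hypothetical filter: drop components smaller than min_area pixels."""
    return [c for c in comps if len(c) >= min_area]

# Toy 4x5 image already reduced to three palette colors (0, 1, 2).
indexed = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
planes = color_planes(indexed, 3)
for color, plane in enumerate(planes):
    kept = filter_components(connected_components(plane))
    print(color, [len(c) for c in kept])
```

Processing each color plane independently means a component never mixes pixels of different reduced colors, which is what allows the per-plane results to be superimposed at the end.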
Similar resources
Natural scene text localization using edge color signature
Localizing text regions in images taken from natural scenes is one of the challenging problems due to variations in font, size, color and orientation of text. In this paper, we introduce a new concept so-called Edge Color Signature for localizing text regions in an image. This method is able to localize both Farsi and English texts. In the proposed method, first a pyramid using diff...
A New Approach to Optical Character Recognition Based on Text Recognition in OCR
Optical Character Recognition (OCR) is a technology that enables you to convert different types of documents, such as scanned paper documents (either handwritten or machine-printed script), PDF files, or images captured by a digital camera, into editable and searchable data. Our intention is to build an automatic text localization and extraction system which is able to accept different types of...
Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents' contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...
PII: S0031-3203(01)00167-4
Text extraction in mixed-type documents is a necessary pre-processing stage for many document applications. In mixed-type color documents, text, drawings and graphics appear with millions of different colors. In many cases, text regions are overlaid onto drawings or graphics. In this paper, a new method to automatically detect and extract text in mixed-type color documents is presented. The ...
Locating text in color documents
In complex color documents, text, drawings and graphics appear with millions of different colors. In many cases, text regions are overlaid onto drawings or graphics. In this paper, a new method is proposed to automatically detect and extract text in mixed-type color documents. The proposed method is based on a combination of an Adaptive Color Reduction (ACR) technique and a Page Layout An...